Graphics using ggplot2 in R

In general, Leo suggests working from a ggplot2 book or reference (see below). His advice is to browse the book (the website is a little harder to skim), find plots you would like to make, then copy the code and adapt it to your own data.

For basic graphics help, and a good place to start when learning graphics in R, see: https://r-graphics.org/


In [1]:
## Before running the notebook, uncomment the following line to install the required packages
#install.packages(c("ggplot2", "cowplot", "ggpubr", "GGally", "reshape", "plotly", "Polychrome", "hexbin")) # hexbin is needed by geom_hex() later on

R Graphics Cookbook: Creating a Scatter Plot

To create a scatter plot in base R, use plot().

In [2]:
plot(mtcars$wt, mtcars$mpg) # mtcars is a built-in dataset that ships with R

The mtcars$wt returns the column named wt from the mtcars data frame, and mtcars$mpg is the mpg column.

With ggplot2, you can get a similar result using the ggplot() function.

In [3]:
library(ggplot2) # This will only work if you've installed ggplot2!
x <- ggplot(mtcars, aes_string(x='wt', y='mpg')) + geom_point()
print(x)

ggplot() creates the plot object. geom_point() adds the layer of points to the plot.

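To use ggplot(), pass it a data frame and tell it which columns to map to each aesthetic. A key difference between Python and R here is that aes() expects bare (unquoted) column names, not strings.

ggplot(mtcars, aes(x=wt, y=mpg)) is the usual form. If you pass strings to aes() instead, the columns are not mapped as intended, which is why I used aes_string(). This also helps with loop input, which is often a string rather than a bare name. Note that aes_string() is soft-deprecated as of ggplot2 3.0.0 in favor of the .data pronoun, for example aes(x = .data[['wt']]).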
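As a minimal sketch of the loop case described above, the snippet below builds one scatter plot per column name using the .data pronoun rather than aes_string(). The names y_cols and plots are made up for this illustration, and the approach assumes ggplot2 3.0.0 or later.

```r
library(ggplot2)

# Hypothetical set of columns to plot against weight; each name is a string
y_cols <- c("mpg", "hp", "qsec")

# Build one plot per column name; .data[[col]] looks the string up in the data
plots <- lapply(y_cols, function(col) {
  ggplot(mtcars, aes(x = .data[["wt"]], y = .data[[col]])) +
    geom_point() +
    labs(x = "wt", y = col)
})
names(plots) <- y_cols

print(plots[["mpg"]])
```

Each element of plots is an ordinary ggplot object, so it can be printed, saved, or extended with further layers.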


Leo assigned the ggplot() object to a variable x and then examined that variable with str(). This was a very innovative way to see what a plot object actually contains.

In [4]:
#str(x) ## Commented out to shorten notebook
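
Since str(x) prints a great deal, a lighter way to look inside the plot object is sketched below. It rebuilds the scatter plot from In [3] and uses summary() and layer_data() from ggplot2; the variable name pts is only for illustration.

```r
library(ggplot2)

x <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# The class shows that the plot is an ordinary R object
class(x)

# Compact overview of the data, mappings, and layers stored in the plot
summary(x)

# The data frame that the first layer (geom_point) will actually draw
pts <- layer_data(x, 1)
head(pts[, c("x", "y")])
```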

Cowplot for annotating ggplot objects


The cowplot package is a simple add-on to ggplot. It provides various features that help with creating publication-quality figures, such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and or mix plots with images. The package was originally written for internal use in my lab, to provide my students and postdocs with the tools to make high-quality figures for their publications. I have also used the package extensively in my book Fundamentals of Data Visualization. This introductory vignette provides a brief glance at the key features of the package. - Claus O Wilke

For more complete documentation, read:

In [5]:
library(cowplot)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) + 
  geom_point()
********************************************************
Note: As of version 1.0.0, cowplot does not change the
  default ggplot2 theme anymore. To recover the previous
  behavior, execute:
  theme_set(theme_cowplot())
********************************************************


Generate a simple and clean theme with cowplot: theme_cowplot().

In [6]:
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) + 
  geom_point() +
  theme_cowplot(12)
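
Instead of adding theme_cowplot() to every plot, the theme can be set globally, as the cowplot startup message suggests. The sketch below is one way to do that and to restore the previous theme afterwards; old_theme and p_global are illustrative names.

```r
library(ggplot2)
library(cowplot)

# theme_set() returns the previously active theme, so it can be restored later
old_theme <- theme_set(theme_cowplot(12))

# No explicit theme layer is needed now; the global theme applies when printing
p_global <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point()
print(p_global)

# Restore whatever theme was active before
theme_set(old_theme)
```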

Another use of cowplot is to annotate and arrange plots. This is currently R specific.

Side note: This might be a good opportunity to write a Python package that mirrors cowplot's functionality.

In [7]:
library(repr) # Adjust the R kernel plot size
# Change plot size to 10 x 5
options(repr.plot.width=10, repr.plot.height=5)

p1 <- ggplot(mtcars, aes(disp, mpg)) + 
  geom_point() + theme_cowplot(12)
p2 <- ggplot(mtcars, aes(qsec, mpg)) +
  geom_point() + theme_cowplot(12)

plot_grid(p1, p2, labels = c('A', 'B'), label_size = 14)
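
plot_grid() also takes layout arguments, and cowplot's save_plot() can write the combined figure to disk. The following is a small sketch under those assumptions; the stacked layout, the temporary PDF file, and the names stacked and out_file are choices made for this example only.

```r
library(ggplot2)
library(cowplot)

p1 <- ggplot(mtcars, aes(disp, mpg)) +
  geom_point() + theme_cowplot(12)
p2 <- ggplot(mtcars, aes(qsec, mpg)) +
  geom_point() + theme_cowplot(12)

# Stack the panels vertically, align them, and let cowplot label them A and B
stacked <- plot_grid(p1, p2, ncol = 1, align = "v",
                     labels = "AUTO", label_size = 14)

# Write the figure to a temporary PDF; replace with a real path as needed
out_file <- tempfile(fileext = ".pdf")
save_plot(out_file, stacked, ncol = 1, nrow = 2, base_height = 3)
```

Passing ncol and nrow to save_plot() scales the output size to the number of panels, so each panel keeps roughly the same dimensions as a single plot.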

GGally - Extension to ggplot()

  • GGally::ggcoef, plot the coefficients of a model
  • GGally::ggduo, display two grouped data in a plot matrix
  • GGally::ggmatrix, managing multiple plots in a matrix-like layout
  • GGally::ggnetworkmap, plotting elegant maps using ggplot
  • GGally::ggpairs, special form of a ggmatrix that produces a pairwise comparison of multivariate data

In [8]:
options(repr.plot.width=16, repr.plot.height=16)
data(tips, package = "reshape") # Only reason we installed `reshape` and not `reshape2`
pm <- GGally::ggpairs(tips)
pm
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggpairs(), part of the GGally package, is useful for getting a visual overview of every variable plotted against every other variable.
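
For wide data frames the full matrix gets crowded, so it can help to restrict the columns and color by a grouping variable. The sketch below assumes the tips data from the reshape package and uses GGally's wrap() to set the histogram bin count explicitly, which should also avoid the `stat_bin()` message; the column selection and the name pm_small are illustrative.

```r
library(ggplot2)
library(GGally)

data(tips, package = "reshape")

# Only three columns, colored by sex; histograms in the lower panels use 20 bins
pm_small <- ggpairs(
  tips,
  columns = c("total_bill", "tip", "sex"),
  mapping = aes(colour = sex),
  lower = list(combo = wrap("facethist", bins = 20))
)
pm_small
```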

ggpubr: ‘ggplot2’ Based Publication Ready Plots

The ‘ggpubr’ package provides some easy-to-use functions for creating and customizing ‘ggplot2’-based publication-ready plots.


Distribution

In [9]:
options(repr.plot.width=7, repr.plot.height=7)
library(ggpubr)

# Create some data format
# :::::::::::::::::::::::::::::::::::::::::::::::::::
set.seed(1234)
wdata = data.frame(
   sex = factor(rep(c("F", "M"), each=200)),
   weight = c(rnorm(200, 55), rnorm(200, 58)))
head(wdata, 4)

# Density plot with mean lines and marginal rug
# :::::::::::::::::::::::::::::::::::::::::::::::::::
# Change outline and fill colors by groups ("sex")
# Use custom palette
ggdensity(wdata, x = "weight",
   add = "mean", rug = TRUE,
   color = "sex", fill = "sex",
   palette = c("#00AFBB", "#E7B800"))
Loading required package: magrittr


Attaching package: ‘ggpubr’


The following object is masked from ‘package:cowplot’:

    get_legend


A data.frame: 4 × 2
  sex   weight
  <fct> <dbl>
1 F     53.79293
2 F     55.27743
3 F     56.08444
4 F     52.65430
In [10]:
# Histogram plot with mean lines and marginal rug
# :::::::::::::::::::::::::::::::::::::::::::::::::::
# Change outline and fill colors by groups ("sex")
# Use custom color palette
gghistogram(wdata, x = "weight",
   add = "mean", rug = TRUE,
   color = "sex", fill = "sex",
   palette = c("#00AFBB", "#E7B800"))
Warning message:
“Using `bins = 30` by default. Pick better value with the argument `bins`.”
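
To see roughly what the ggpubr density call does, here is a sketch of a comparable plot in plain ggplot2: a density per group, a marginal rug, and a dashed vertical line at each group mean. This is an approximation for illustration, not ggpubr's internal code; group_means and p_density are names introduced here.

```r
library(ggplot2)

set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

# Per-group means, used for the dashed vertical lines
group_means <- aggregate(weight ~ sex, data = wdata, FUN = mean)

p_density <- ggplot(wdata, aes(x = weight, colour = sex, fill = sex)) +
  geom_density(alpha = 0.5) +
  geom_rug() +
  geom_vline(data = group_means,
             aes(xintercept = weight, colour = sex),
             linetype = "dashed") +
  scale_colour_manual(values = c("#00AFBB", "#E7B800")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))
print(p_density)
```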

Box plots and violin plots

In [11]:
# Load data
data("ToothGrowth")
df <- ToothGrowth
head(df, 4)

# Box plots with jittered points
# :::::::::::::::::::::::::::::::::::::::::::::::::::
# Change outline colors by groups: dose
# Use custom color palette
# Add jitter points and change the shape by groups
 p <- ggboxplot(df, x = "dose", y = "len",
                color = "dose", palette =c("#00AFBB", "#E7B800", "#FC4E07"),
                add = "jitter", shape = "dose")
 p
A data.frame: 4 × 3
  len   supp  dose
  <dbl> <fct> <dbl>
1  4.2  VC    0.5
2 11.5  VC    0.5
3  7.3  VC    0.5
4  5.8  VC    0.5
In [12]:
# Add p-values comparing groups
 # Specify the comparisons you want
my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
p + stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
  stat_compare_means(label.y = 50)                   # Add global p-value
Warning message in wilcox.test.default(c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, :
“cannot compute exact p-value with ties”
Warning message in wilcox.test.default(c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, :
“cannot compute exact p-value with ties”
Warning message in wilcox.test.default(c(16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, :
“cannot compute exact p-value with ties”
In [13]:
# Violin plots with box plots inside
# :::::::::::::::::::::::::::::::::::::::::::::::::::
# Change fill color by groups: dose
# add boxplot with white fill color
ggviolin(df, x = "dose", y = "len", fill = "dose",
         palette = c("#00AFBB", "#E7B800", "#FC4E07"),
         add = "boxplot", add.params = list(fill = "white"))+
  stat_compare_means(comparisons = my_comparisons, label = "p.signif")+ # Add significance levels
  stat_compare_means(label.y = 50)
Warning message in wilcox.test.default(c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, :
“cannot compute exact p-value with ties”
Warning message in wilcox.test.default(c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, :
“cannot compute exact p-value with ties”
Warning message in wilcox.test.default(c(16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, :
“cannot compute exact p-value with ties”
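
The warnings above come from wilcox.test(), which stat_compare_means() uses by default for two-group comparisons (with kruskal.test() for the global comparison across several groups). As a sketch, the same tests can be run directly in base R; passing exact = FALSE requests the normal approximation up front, so the ties warning does not appear. The names global_test and pairwise_p are illustrative.

```r
df <- ToothGrowth

# Global test across the three doses
global_test <- kruskal.test(len ~ dose, data = df)

# Pairwise Wilcoxon rank-sum tests for the same comparisons used in the plots
my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))
pairwise_p <- sapply(my_comparisons, function(pair) {
  sub_df <- df[as.character(df$dose) %in% pair, ]
  # exact = FALSE: use the normal approximation, which is what R falls back to with ties
  wilcox.test(len ~ dose, data = sub_df, exact = FALSE)$p.value
})
names(pairwise_p) <- sapply(my_comparisons, paste, collapse = " vs ")

print(global_test$p.value)
print(pairwise_p)
```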

Bar plots

In [14]:
# Load data
data("mtcars")
dfm <- mtcars
# Convert the cyl variable to a factor
dfm$cyl <- as.factor(dfm$cyl)
# Add the name column
dfm$name <- rownames(dfm)
# Inspect the data
head(dfm[, c("name", "wt", "mpg", "cyl")])
A data.frame: 6 × 4
                   name               wt    mpg   cyl
                   <chr>              <dbl> <dbl> <fct>
Mazda RX4          Mazda RX4          2.620 21.0  6
Mazda RX4 Wag      Mazda RX4 Wag      2.875 21.0  6
Datsun 710         Datsun 710         2.320 22.8  4
Hornet 4 Drive     Hornet 4 Drive     3.215 21.4  6
Hornet Sportabout  Hornet Sportabout  3.440 18.7  8
Valiant            Valiant            3.460 18.1  6
In [15]:
ggbarplot(dfm, x = "name", y = "mpg",
          fill = "cyl",               # change fill color by cyl
          color = "white",            # Set bar border colors to white
          palette = "jco",            # jco journal color palette. see ?ggpar
          sort.val = "desc",          # Sort the values in descending order
          sort.by.groups = FALSE,     # Don't sort inside each group
          x.text.angle = 90           # Rotate vertically x axis texts
          )
In [16]:
ggbarplot(dfm, x = "name", y = "mpg",
          fill = "cyl",               # change fill color by cyl
          color = "white",            # Set bar border colors to white
          palette = "jco",            # jco journal color palette. see ?ggpar
          sort.val = "asc",           # Sort the values in ascending order
          sort.by.groups = TRUE,      # Sort inside each group
          x.text.angle = 90           # Rotate vertically x axis texts
          )
In [17]:
# Calculate the z-score of the mpg data
dfm$mpg_z <- (dfm$mpg -mean(dfm$mpg))/sd(dfm$mpg)
dfm$mpg_grp <- factor(ifelse(dfm$mpg_z < 0, "low", "high"), 
                     levels = c("low", "high"))
# Inspect the data
head(dfm[, c("name", "wt", "mpg", "mpg_z", "mpg_grp", "cyl")])
A data.frame: 6 × 6
                   name               wt    mpg   mpg_z      mpg_grp cyl
                   <chr>              <dbl> <dbl> <dbl>      <fct>   <fct>
Mazda RX4          Mazda RX4          2.620 21.0   0.1508848 high    6
Mazda RX4 Wag      Mazda RX4 Wag      2.875 21.0   0.1508848 high    6
Datsun 710         Datsun 710         2.320 22.8   0.4495434 high    4
Hornet 4 Drive     Hornet 4 Drive     3.215 21.4   0.2172534 high    6
Hornet Sportabout  Hornet Sportabout  3.440 18.7  -0.2307345 low     8
Valiant            Valiant            3.460 18.1  -0.3302874 low     6
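
The manual z-score in In [17] subtracts the mean and divides by the standard deviation. As a quick check, base R's scale() gives the same values; z_manual and z_scale are illustrative names.

```r
# Manual z-score, as computed in the notebook
z_manual <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

# scale() centers and scales by default and returns a one-column matrix
z_scale <- as.numeric(scale(mtcars$mpg))

# Both approaches should agree up to floating point tolerance
all.equal(z_manual, z_scale)
```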
In [18]:
ggbarplot(dfm, x = "name", y = "mpg_z",
          fill = "mpg_grp",           # change fill color by mpg_grp
          color = "white",            # Set bar border colors to white
          palette = "jco",            # jco journal color palette. see ?ggpar
          sort.val = "asc",           # Sort the value in ascending order
          sort.by.groups = FALSE,     # Don't sort inside each group
          x.text.angle = 90,          # Rotate vertically x axis texts
          ylab = "MPG z-score",
          xlab = FALSE,
          legend.title = "MPG Group"
          )
In [19]:
ggbarplot(dfm, x = "name", y = "mpg_z",
          fill = "mpg_grp",           # change fill color by mpg_grp
          color = "white",            # Set bar border colors to white
          palette = "jco",            # jco journal color palette. see ?ggpar
          sort.val = "desc",          # Sort the value in descending order
          sort.by.groups = FALSE,     # Don't sort inside each group
          x.text.angle = 90,          # Rotate vertically x axis texts
          ylab = "MPG z-score",
          legend.title = "MPG Group",
          rotate = TRUE,
          ggtheme = theme_minimal()
          )

Polychrome

Polychrome is a tool for creating, viewing, and assessing qualitative palettes with many (20-30 or more) colors. This matters because R 4.0.0 changed the default color palette.

Note: If you are in the middle of writing a paper, try not to upgrade R. Otherwise, you might have to redo the colors in your figures.


In [20]:
library(Polychrome) # https://rdrr.io/cran/Polychrome/f/vignettes/polychrome.Rmd
mypal <- kelly.colors(22)
swatch(mypal)
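
A Polychrome palette can be passed to a ggplot2 manual scale. The sketch below makes two assumptions that are worth checking with swatch(): that the first Kelly color is a near-white intended for backgrounds (so it is skipped), and that kelly.colors() returns a named vector. unname() is used because scale_colour_manual() matches a named values vector against the factor levels by name. The name pal is illustrative.

```r
library(ggplot2)
library(Polychrome)

mypal <- kelly.colors(22)

# Skip the first color (assumed near-white) and take three; drop the names
pal <- unname(mypal[2:4])

ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point() +
  scale_colour_manual(values = pal)
```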

Color blind palettes

Ren asked a question about colorblind-friendly palettes, and Leo pointed out viridisLite. I normally just use viridis.


viridisLite

Matplotlib recently introduced new color maps for their graphs. They are called viridis, magma, inferno, and plasma. viridis was made the new default color map of Matplotlib.

NOTE: viridisLite is the 'lite' version of the more complete viridis package. viridisLite contains only the core functions of viridis that generate the color vectors for each of the aforementioned color maps. It does not have any of the other features of the full viridis package (e.g. scale functions for ggplot2). This was requested by users of viridis who did not want to have to import the dependencies of viridis but still wanted to be able to use the color maps it provides.
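
As a small sketch of what viridisLite provides: each palette function simply returns a character vector of hex colors, which can then be handed to any plotting function that accepts colors. The variable names below are illustrative.

```r
library(viridisLite)

# Each call returns n hex color strings (including an alpha channel)
pal_viridis <- viridis(5)                # option "D" is the default
pal_magma   <- viridis(5, option = "A")  # same palette as magma(5)
pal_inferno <- inferno(5)
pal_plasma  <- plasma(5)

print(pal_viridis)
```

Because these are plain color vectors, using them in ggplot2 requires a generic scale such as scale_fill_gradientn(); the full viridis package adds dedicated scale functions.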

In [21]:
#install.packages("viridisLite")
library(viridisLite)
library(hexbin)
dat <- data.frame(x = rnorm(10000), y = rnorm(10000))
ggplot(dat, aes(x = x, y = y)) +
  geom_hex() + coord_fixed() +
  scale_fill_gradientn(colours = viridis(256, option = "D"))
In [22]:
# using code from RColorBrewer to demo the palette
n = 200
image(1:n, 1, as.matrix(1:n),
      col = viridis(n, option = "D"),
      xlab = "viridis n", ylab = "", xaxt = "n", yaxt = "n", bty = "n")

viridis

Use the color scales in this package to make plots that are pretty, better represent your data, easier to read by those with colorblindness, and print well in grey scale.

In [23]:
#install.packages("viridis")
library(viridis)

x <- y <- seq(-8*pi, 8*pi, len = 40)
r <- sqrt(outer(x^2, y^2, "+"))
filled.contour(cos(r^2)*exp(-r/(2*pi)), 
               axes=FALSE,
               color.palette=viridis,
               asp=1)
In [24]:
ggplot(data.frame(x = rnorm(10000), y = rnorm(10000)), aes(x = x, y = y)) +
  geom_hex() + coord_fixed() +
  scale_fill_viridis() + theme_bw()

plotly (interactive web graphics)


Notes:

  • plotly is also available for Python
  • it also has a Jupyter widget

In [25]:
library(plotly)
g <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") + # after_stat() replaces the older ..level.. notation (ggplot2 >= 3.3.0)
  xlim(1, 6) + ylim(40, 100)
ggplotly(g)
Attaching package: ‘plotly’


The following object is masked from ‘package:ggplot2’:

    last_plot


The following object is masked from ‘package:stats’:

    filter


The following object is masked from ‘package:graphics’:

    layout
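
An interactive plot can also be saved as an HTML file for sharing outside the notebook. This is a sketch using htmlwidgets::saveWidget(); the simple scatter plot, the output path in tempdir(), and the names w and out are placeholders for this example. selfcontained = FALSE is used here so that pandoc is not required; the JavaScript dependencies are then written to a folder next to the HTML file.

```r
library(ggplot2)
library(plotly)

g <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point()

# Convert the static ggplot into an interactive plotly widget
w <- ggplotly(g)

# Save the widget; replace the path with a real location as needed
out <- file.path(tempdir(), "faithful.html")
htmlwidgets::saveWidget(w, out, selfcontained = FALSE)
```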


Interactive plots from spatialLIBD

These seemed to be very computationally expensive, so I have not added them to the notebook.

Reproducibility information

In [26]:
#install.packages('sessioninfo')
print('Reproducibility information:')
Sys.time()
proc.time()
options(width = 120)
sessioninfo::session_info()
[1] "Reproducibility information:"
[1] "2020-04-27 11:21:56 EDT"
   user  system elapsed 
 12.895   0.183  13.091 
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       Arch Linux                  
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2020-04-27                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package       * version  date       lib source        
 assertthat      0.2.1    2019-03-21 [1] CRAN (R 3.6.3)
 base64enc       0.1-3    2015-07-28 [1] CRAN (R 3.6.3)
 cli             2.0.2    2020-02-28 [1] CRAN (R 3.6.3)
 colorspace      1.4-1    2019-03-18 [1] CRAN (R 3.6.3)
 cowplot       * 1.0.0    2019-07-11 [1] CRAN (R 3.6.3)
 crayon          1.3.4    2017-09-16 [1] CRAN (R 3.6.3)
 crosstalk       1.1.0.1  2020-03-13 [1] CRAN (R 3.6.3)
 data.table      1.12.8   2019-12-09 [1] CRAN (R 3.6.3)
 digest          0.6.25   2020-02-23 [1] CRAN (R 3.6.3)
 dplyr           0.8.5    2020-03-07 [1] CRAN (R 3.6.3)
 evaluate        0.14     2019-05-28 [1] CRAN (R 3.6.3)
 fansi           0.4.1    2020-01-08 [1] CRAN (R 3.6.3)
 farver          2.0.3    2020-01-16 [1] CRAN (R 3.6.3)
 GGally          1.5.0    2020-03-25 [1] CRAN (R 3.6.3)
 ggplot2       * 3.3.0    2020-03-05 [1] CRAN (R 3.6.3)
 ggpubr        * 0.2.5    2020-02-13 [1] CRAN (R 3.6.3)
 ggsci           2.9      2018-05-14 [1] CRAN (R 3.6.3)
 ggsignif        0.6.0    2019-08-08 [1] CRAN (R 3.6.3)
 glue            1.3.2    2020-03-12 [1] CRAN (R 3.6.3)
 gridExtra       2.3      2017-09-09 [1] CRAN (R 3.6.3)
 gtable          0.3.0    2019-03-25 [1] CRAN (R 3.6.3)
 hexbin        * 1.28.1   2020-02-03 [1] CRAN (R 3.6.3)
 htmltools       0.4.0    2019-10-04 [1] CRAN (R 3.6.3)
 htmlwidgets     1.5.1    2019-10-08 [1] CRAN (R 3.6.3)
 httr            1.4.1    2019-08-05 [1] CRAN (R 3.6.3)
 IRdisplay       0.7.0    2018-11-29 [1] CRAN (R 3.6.3)
 IRkernel        1.1      2019-12-06 [1] CRAN (R 3.6.3)
 isoband         0.2.0    2019-04-06 [1] CRAN (R 3.6.3)
 jsonlite        1.6.1    2020-02-02 [1] CRAN (R 3.6.3)
 labeling        0.3      2014-08-23 [1] CRAN (R 3.6.3)
 lattice         0.20-38  2018-11-04 [2] CRAN (R 3.6.3)
 lazyeval        0.2.2    2019-03-15 [1] CRAN (R 3.6.3)
 lifecycle       0.2.0    2020-03-06 [1] CRAN (R 3.6.3)
 magrittr      * 1.5      2014-11-22 [1] CRAN (R 3.6.3)
 MASS            7.3-51.5 2019-12-20 [2] CRAN (R 3.6.3)
 munsell         0.5.0    2018-06-12 [1] CRAN (R 3.6.3)
 pbdZMQ          0.3-3    2018-05-05 [1] CRAN (R 3.6.3)
 pillar          1.4.3    2019-12-20 [1] CRAN (R 3.6.3)
 pkgconfig       2.0.3    2019-09-22 [1] CRAN (R 3.6.3)
 plotly        * 4.9.2.1  2020-04-04 [1] CRAN (R 3.6.3)
 plyr            1.8.6    2020-03-03 [1] CRAN (R 3.6.3)
 Polychrome    * 1.2.5    2020-03-29 [1] CRAN (R 3.6.3)
 purrr           0.3.3    2019-10-18 [1] CRAN (R 3.6.3)
 R6              2.4.1    2019-11-12 [1] CRAN (R 3.6.3)
 RColorBrewer    1.1-2    2014-12-07 [1] CRAN (R 3.6.3)
 Rcpp            1.0.4    2020-03-17 [1] CRAN (R 3.6.3)
 repr          * 1.1.0    2020-01-28 [1] CRAN (R 3.6.3)
 reshape         0.8.8    2018-10-23 [1] CRAN (R 3.6.3)
 rlang           0.4.5    2020-03-01 [1] CRAN (R 3.6.3)
 scales          1.1.0    2019-11-18 [1] CRAN (R 3.6.3)
 scatterplot3d   0.3-41   2018-03-14 [1] CRAN (R 3.6.3)
 sessioninfo     1.1.1    2018-11-05 [1] CRAN (R 3.6.3)
 tibble          2.1.3    2019-06-06 [1] CRAN (R 3.6.3)
 tidyr           1.0.2    2020-01-24 [1] CRAN (R 3.6.3)
 tidyselect      1.0.0    2020-01-27 [1] CRAN (R 3.6.3)
 uuid            0.1-4    2020-02-26 [1] CRAN (R 3.6.3)
 vctrs           0.2.4    2020-03-10 [1] CRAN (R 3.6.3)
 viridis       * 0.5.1    2018-03-29 [1] CRAN (R 3.6.3)
 viridisLite   * 0.3.0    2018-02-01 [1] CRAN (R 3.6.3)
 withr           2.1.2    2018-03-15 [1] CRAN (R 3.6.3)
 yaml            2.2.1    2020-02-01 [1] CRAN (R 3.6.3)

[1] /home/kj/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/lib/R/library